[00:00.000 --> 00:05.040]  My talk is going to be about cloud host based strategy by staging defensive tools for threat
[00:05.040 --> 00:11.620]  hunting and forensics. Who am I? My name is Michael Mimo. I'm the chief security officer
[00:11.620 --> 00:19.040]  of Copyright Clearance Center. You can follow me on security DevOps on Twitter. I've had
[00:19.040 --> 00:24.900]  various different roles doing cybersecurity in the past. I'd say in the past seven, eight
[00:24.900 --> 00:31.040]  years really focused on the defensive side. All right. Just a quick disclaimer, none of
[00:31.040 --> 00:35.560]  the tools or vendors presented in this are an endorsement. And there are many different
[00:35.560 --> 00:40.620]  tools that you can use to achieve the same goals I'm going to talk about here. And my
[00:40.620 --> 00:46.580]  talk is very much focused on AWS, but I think these concepts could be applied to any cloud
[00:46.580 --> 00:53.320]  provider. All right. So the problem statement. How do you scale at an enterprise level forensics
[00:53.320 --> 00:58.400]  and threat hunting in the cloud, right? So I was thinking about how best to present this.
[00:58.400 --> 01:03.700]  And I think one metaphor that I can use here is every year my son and I go to this gaming
[01:03.700 --> 01:12.100]  convention. And one of the games that we play there is this Axis and Allies at the sea game.
[01:12.180 --> 01:17.680]  And what happens in that game is we simulate the Battle of Midway, right? And the Battle
[01:17.680 --> 01:25.600]  of Midway was a World War II battle between the Japanese and the United States. And a
[01:25.600 --> 01:30.080]  couple of things that happen in that game that are quite interesting is, for instance,
[01:30.080 --> 01:35.280]  the initiative, right? Who gets to go first? If you win initiative, you're actually not
[01:35.280 --> 01:39.000]  the first to move. And it's actually a disadvantage for you to move first. And that's
[01:39.000 --> 01:43.600]  sort of similar to what happens in the real world, right? So your attackers make the first
[01:43.600 --> 01:50.100]  move and you have to, you know, you have to react to that. Also, you know, there's stealth
[01:50.100 --> 01:55.080]  involved here, right? So you don't want the enemy to find your ships before you find
[01:55.080 --> 02:01.340]  theirs. So that's also an element of how cybersecurity and the defensive side works,
[02:01.340 --> 02:06.740]  and especially when you're doing threat hunting. And then there's reconnaissance, right?
[02:06.740 --> 02:12.140]  Another way of saying that is doing threat hunting. And I was told a
[02:12.140 --> 02:19.720]  few years ago by a Cyber Command person who worked at the United States NSA that, you know, where
[02:19.720 --> 02:24.540]  did threat hunting come from? It was one of these techniques that was developed during World War II,
[02:24.540 --> 02:29.480]  and maybe it was done before that, but the United States used it in World War II to test their
[02:29.480 --> 02:34.980]  sentry points. So they would have sentry points defending the perimeter of, you know, where their
[02:34.980 --> 02:40.920]  troops were, and you would send your own troops in to test those sentry points to see just how
[02:40.920 --> 02:47.940]  awake and reactive they were. So with that in mind, let's talk about the good old days,
[02:47.940 --> 02:53.260]  right? Traditional forensics environments, what were they like? You know, you had these forensic
[02:53.260 --> 02:58.400]  towers, and you would sit at your desk, and you would, you know, use that forensic tower to
[02:58.400 --> 03:04.760]  connect to your LAN. Oftentimes in the office you would have access to anything on that
[03:04.760 --> 03:09.920]  LAN, so you could connect to it and use your forensic tower and the software that's
[03:09.920 --> 03:16.680]  on there to conduct forensics. Or if you end up confiscating hard drives and computers, you would
[03:16.680 --> 03:22.780]  use a write blocker and connect that write blocker to your computer and, you know, examine the
[03:22.780 --> 03:28.220]  evidence there. Also, a lot of the forensic software was pretty much only activated if you
[03:28.220 --> 03:32.520]  used a dongle. So you'd have to connect the dongle to your forensic tower in order to use that.
[03:32.520 --> 03:40.720]  But here we are in the new wise days. Things are different now, right? So host forensics,
[03:40.720 --> 03:43.920]  it's, you know, you're going to have to do it in the cloud now. A lot of the infrastructure,
[03:43.920 --> 03:48.860]  especially on the product side, has moved to a SaaS-based type infrastructure in the cloud.
[03:48.860 --> 03:54.840]  And, you know, you no longer have access to everything on the LAN. There's a lot of,
[03:54.840 --> 04:01.000]  you know, access control lists and security groups, and things might be in multiple VPCs.
[04:01.000 --> 04:06.620]  This makes it a bit of a challenge when you're in the cloud and you're trying to conduct forensics.
[04:06.920 --> 04:11.000]  Also, you need to really think about cloud automation, right? I mean, you're not going
[04:11.000 --> 04:17.760]  to manually be going in there and changing settings or opening security groups. You need
[04:17.760 --> 04:24.260]  to find some way to automate that. And you need to put in continuous integration, continuous
[04:24.260 --> 04:28.780]  deployment into the pipeline for your security defenses, right? That's a really important aspect
[04:28.780 --> 04:33.880]  of this. And this is one area that I'm going to talk about a lot in the second half of this talk.
[04:34.100 --> 04:40.380]  And at the end of the day, security is everyone's problem. It's no longer just, you know,
[04:40.380 --> 04:45.240]  security teams that are responsible for that. The DevOps teams, the operations teams, the business,
[04:45.240 --> 04:51.680]  we all very much need to be aware of what's the best way to implement security.
[04:52.280 --> 04:55.560]  So I want to talk a little bit about the shared responsibility model with cloud providers,
[04:55.560 --> 05:01.560]  especially with AWS. So basically, one way to look at this is like, you know, the orange
[05:01.560 --> 05:09.020]  here is AWS's responsibility. And what divides the orange from the blue is what we call like
[05:09.160 --> 05:14.060]  a management plane. And everything above the management plane is supposed to be the responsibility
[05:14.060 --> 05:23.680]  of the client, right? So here's a snippet here of defining the, you know, for at least the
[05:23.680 --> 05:29.700]  Elastic Compute Cloud infrastructure. They're basically telling you here, you know, you're
[05:29.700 --> 05:34.080]  responsible for making sure that you do the security configuration and the management tasks
[05:34.640 --> 05:40.740]  of those devices. Furthermore, if you look through their documentation, this is something I found
[05:40.740 --> 05:47.540]  regarding how they view digital forensics. If you read through this, it's a bit longer than
[05:47.540 --> 05:53.620]  what I posted here, but they really talk about, you know, logging, being part of their
[05:54.360 --> 05:57.740]  data that they collect for you in terms of digital forensics. They're not really
[05:57.740 --> 06:02.460]  talking about the hosts, right? And in fact, they even go further here and tell you that
[06:02.460 --> 06:06.980]  the responsibility to determine, you know, how the attack happened and the breach and
[06:06.980 --> 06:13.560]  what was compromised is very much your responsibility as a client. And they also
[06:13.560 --> 06:22.120]  briefly cover topics around, you know, how you can recover
[06:22.120 --> 06:26.820]  from an attack, and that they're there to help you do that. But they're not really focused on
[06:28.260 --> 06:33.660]  doing the analysis piece, right? So when you look at the NIST model here, they will help you
[06:33.660 --> 06:39.880]  with detection, right? Because they have a lot of tools out there to help you with that. They
[06:39.880 --> 06:45.200]  might help you with containment because you're going to use their technology to possibly bring
[06:45.200 --> 06:50.560]  something down or terminate it or bring it into a secured location where you can examine the
[06:50.560 --> 06:55.640]  artifact further. And they might help you with eradication and recovery. But they really aren't
[06:56.080 --> 07:00.620]  going to be helping you, at least from a host-based perspective on the analysis piece.
[07:00.620 --> 07:04.760]  And this is really what I want to focus this talk on is, you know, what is it that you could do
[07:04.760 --> 07:12.740]  to help you get better insight into host-based forensics? So I want to develop this principle
[07:12.740 --> 07:19.540]  called Sec DevOps Defense, SecDOD. In order to implement this properly, I mean, you really need
[07:19.540 --> 07:24.540]  to start thinking about how you design your forensic environments. Don't let them be homegrown
[07:24.540 --> 07:29.660]  and kind of evolve on their own. You really need to think about how you should structure your
[07:29.660 --> 07:34.100]  forensic environments, how they fit into the overall plan of your cloud infrastructure.
[07:34.100 --> 07:40.020]  Another big component of that is, especially if you're a mid-size or a large-size company,
[07:40.020 --> 07:43.840]  you want to be orchestrating your forensic environment so that you can meet the demand.
[07:43.840 --> 07:48.100]  So if you have a, you know, a large incident you need to deal with, and you need to provide
[07:48.580 --> 07:51.620]  a lot of infrastructure for your forensic team, you want to be able to do that quickly. You don't
[07:51.620 --> 07:55.480]  want to have to order, you know, more forensic towers, which will come, you know, two weeks
[07:55.480 --> 08:03.000]  from now and be installed in your forensic labs. It just doesn't scale anymore like that. So you
[08:03.000 --> 08:09.240]  really need to think about building, orchestrating forensic environments. You also should, you know,
[08:09.240 --> 08:14.580]  be pen testing those forensic environments from the cloud. That's really important to do and
[08:14.580 --> 08:20.180]  make sure that they're secured in the way that you, you know, architected them and designed them.
[08:20.180 --> 08:24.660]  And one of the things that you should be thinking about is building and thinking about cloud
[08:24.660 --> 08:30.860]  forensic patterns. These might, you know, be really useful for you to cookie-cutter
[08:30.860 --> 08:39.980]  across different cloud environments as you start to grow your forensic environments in your space.
[08:40.820 --> 08:47.300]  So here's a pattern, you know, that I put together. So, you know, this pattern is
[08:47.300 --> 08:52.340]  showing you that the forensic infrastructure is still in the office, right? But, you know,
[08:52.340 --> 08:57.360]  all your SaaS has been moved to the cloud. And now the forensics teams are scratching their heads,
[08:57.360 --> 09:02.120]  thinking about how they can be doing threat hunting or forensics in the cloud
[09:03.300 --> 09:08.180]  when everything's been moved up there and your forensic environments haven't been moved.
[09:08.200 --> 09:13.080]  Yeah, so you really don't want to be in this kind of position. So as you start moving to,
[09:13.080 --> 09:18.880]  you know, your SaaS-based infrastructure, you really want to start thinking about, you know,
[09:18.880 --> 09:23.840]  what things need to be done in order for your forensic environments to really connect
[09:24.780 --> 09:28.340]  and for the forensic people to be able to do their job up in the cloud. So
[09:29.000 --> 09:38.220]  I try to design my forensic infrastructure in two different spaces. One is, you know,
[09:38.220 --> 09:44.980]  we have an acquisition infrastructure space where you really use forensic tools there just to do
[09:44.980 --> 09:52.620]  the acquisition of host-based forensics, right? And that should be separated. And I call that
[09:52.620 --> 09:57.780]  a corporate gap rather than an air gap. That should be separated from your examiner
[09:57.780 --> 10:02.540]  infrastructure. And the examiner infrastructure may have even more variations of separation,
[10:02.540 --> 10:06.600]  like you might have an infrastructure there that is used for malware analysis, and that
[10:06.600 --> 10:10.700]  might be air-gapped, right? You don't want that to be spreading and you want that to be very isolated
[10:10.700 --> 10:14.720]  as you examine malware. But your examiner infrastructure and your acquisition infrastructure
[10:14.720 --> 10:20.500]  should not have a connection. And one way I solve that problem is, you know, I use cloud
[10:20.500 --> 10:24.420]  services. Now, yes, theoretically they're connected because I'm using cloud services,
[10:24.420 --> 10:31.560]  but there are degrees of separation there that help you maintain, you know, that integrity when
[10:31.560 --> 10:36.720]  you're examining something and not having the fear that as you examine something it might actually
[10:36.720 --> 10:42.540]  leak out into the corporate network. So orchestration, this is kind of a design
[10:42.540 --> 10:47.280]  pattern on how you might want to build your orchestration. You'll have a virtual environment
[10:47.980 --> 10:55.020]  that, you know, lets you build, lets you deploy on demand for virtual forensic clients.
[10:55.200 --> 11:01.140]  You'll have your forensicators that will log into some centralized controlling application
[11:01.640 --> 11:07.900]  that you built. This application should be tracking inventory. You don't want to exceed
[11:08.460 --> 11:12.840]  the capacity of your infrastructure, your virtual infrastructure, or if you're in the cloud,
[11:12.840 --> 11:18.500]  you don't want to, you know, expand your on-demand to the point where it becomes a
[11:18.500 --> 11:23.240]  budgetary issue. It should be tracking your users. And as I always say, there should always
[11:23.240 --> 11:29.880]  be an approval process built into these kinds of orchestration mechanisms. You know,
[11:29.880 --> 11:37.080]  anybody that does forensics or is doing any kind of investigation needs to have an
[11:37.080 --> 11:41.080]  approval to do that. You know, we don't want the forensic teams to be connecting to
[11:41.600 --> 11:48.040]  assets without there being some sort of approval or ticket for the work that they're doing.
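
A minimal sketch of the controlling application described above, assuming a simple in-memory model. The class name `ForensicOrchestrator`, its methods, and the ticket and workstation identifiers are hypothetical illustrations and do not come from any tool named in the talk. It only shows the three checks mentioned: tracking users, capping inventory, and refusing any request that lacks an approved ticket.

```python
from dataclasses import dataclass, field


@dataclass
class ForensicOrchestrator:
    """Hypothetical in-memory controller for on-demand forensic workstations."""

    capacity: int  # maximum workstations allowed at once (infrastructure or budget cap)
    approved_tickets: set = field(default_factory=set)
    allocations: dict = field(default_factory=dict)  # workstation id -> (user, ticket)
    _next_id: int = 1

    def approve(self, ticket: str) -> None:
        """Record that an investigation ticket has been approved."""
        self.approved_tickets.add(ticket)

    def request_workstation(self, user: str, ticket: str) -> str:
        """Allocate a workstation only for approved work and within capacity."""
        if ticket not in self.approved_tickets:
            raise PermissionError(f"ticket {ticket} has not been approved")
        if len(self.allocations) >= self.capacity:
            raise RuntimeError("forensic workstation capacity reached")
        workstation_id = f"forensic-ws-{self._next_id}"
        self._next_id += 1
        self.allocations[workstation_id] = (user, ticket)
        return workstation_id

    def release(self, workstation_id: str) -> None:
        """Return a workstation to the pool when the examiner is done."""
        self.allocations.pop(workstation_id, None)
```

In a real deployment the allocation step would call whatever virtualization or cloud API provisions the forensic client; that part is left out here because the talk does not name one.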
[11:48.100 --> 11:52.740]  One of the other challenges, as I mentioned earlier, is that a lot of the offerings work on
[11:52.740 --> 12:00.120]  USB-based technology in order to activate them. So you might want to look at network-attached
[12:00.120 --> 12:07.340]  USB solutions for that. Again, there's a caveat because you need to make sure that that's not
[12:07.420 --> 12:11.720]  a problem with the software you're using. Some of the vendors might consider it a licensing violation if
[12:11.720 --> 12:15.900]  you do that. But for the ones that don't, that's definitely one way to solve that
[12:15.900 --> 12:23.000]  issue there. So pipeline challenges with DevOps teams. You really want to coordinate
[12:23.000 --> 12:27.300]  and collaborate with the DevOps architecture teams. In the past, you know, I think there was
[12:27.400 --> 12:31.900]  a lot of separation between, you know, security teams doing their thing and the DevOps, you know,
[12:31.900 --> 12:36.080]  the development teams are doing their own. You really need to start getting involved with the
[12:36.080 --> 12:41.320]  DevOps teams. I know there were a lot of talks at Black Hat last year regarding there being a need
[12:41.320 --> 12:48.620]  for security teams to make sure they collaborate with architecture teams. You know, the architecture
[12:48.620 --> 12:55.280]  teams are trying to build in engineering solutions, right? And they're really risk-weighted.
[12:55.280 --> 13:00.020]  You know, you want to talk to them and introduce security solutions in a risk-weighted
[13:00.380 --> 13:06.260]  way, right? So that, you know, they can understand. You know, in some cases,
[13:06.260 --> 13:12.620]  there may be overhead to implement a security solution, but if the risk is not that
[13:12.620 --> 13:19.580]  high, then you might pass on doing that. Whereas in other areas, solutions
[13:19.580 --> 13:25.640]  might need to be there because the risk is very high and you want them to adopt those
[13:25.640 --> 13:33.360]  solutions into the overall design of the product. And you really should be doing security
[13:34.500 --> 13:38.860]  design so that you do it for security purposes and not just because you're trying to meet
[13:38.860 --> 13:45.420]  some kind of compliance requirement. And design into the pipeline. You want to design into the
[13:45.420 --> 13:50.120]  pipeline forensic solutions. Look at the forensic tools. A lot of those tools are actually really
[13:50.120 --> 13:56.240]  foreign to DevOps teams. They haven't ever been exposed to that kind of stuff. And it's, you know,
[13:56.240 --> 14:00.340]  there is a little bit of learning curve for them and the methodology as well. It's, you know,
[14:00.340 --> 14:07.060]  the forensic methodology is very different than the normal DevOps deployment, you know, the way
[14:07.060 --> 14:11.700]  they build software and their development life cycle. It's very different than what forensic
[14:11.700 --> 14:19.200]  methodology is trying to do. And sometimes there's areas of clash there. So you need to work those
[14:19.200 --> 14:26.500]  out with your architecture teams. So what would a forensic tool pipeline look like? You know,
[14:26.500 --> 14:32.140]  really you want to start at the beginning. You want to have a design as architecture teams trying
[14:32.140 --> 14:37.640]  to design some kind of technology. You want to introduce your forensic solutions into that along
[14:37.640 --> 14:42.840]  with just like you would with the other stuff, other operational needs like logging and alerting
[14:42.840 --> 14:48.520]  that need to go into that. You want to be prototyping that. And you want to be able to
[14:48.520 --> 14:53.700]  feed that back as a feedback loop into the design process until you get something stable.
[14:53.860 --> 14:59.540]  At that point, you'll want to look at some automation. In this case, I'm showing you
[14:59.540 --> 15:06.580]  Terraform where, as you build this solution, you might execute some automation tools with
[15:06.580 --> 15:12.980]  Terraform or with Packer in order to achieve the goal that you're trying to do here. And you want
[15:12.980 --> 15:19.780]  to test these security solutions and validate them throughout the development life cycle here
[15:19.780 --> 15:25.200]  in the test environments and the dev environments. And finally, you know, hopefully you get something
[15:25.200 --> 15:31.920]  really solid that you can deploy to production and your threat hunt team is able to do the thing
[15:31.920 --> 15:40.280]  they need to do, which is threat hunt. Okay, so baking process, you know, you want to bake into
[15:40.940 --> 15:47.620]  the Amazon machine images creation process. You want to bake in all your tools that the
[15:48.580 --> 15:55.360]  forensics team is asking to be put in there. You know, tools to be able to examine file systems
[15:55.360 --> 16:02.280]  and do memory captures. Some of these tools, you know, you can use open source. Some are,
[16:02.280 --> 16:06.000]  you know, you might need to purchase them commercially. That all depends on what you've
[16:06.000 --> 16:11.380]  kind of decided to do in your organization. But basically all these tools should be incorporated
[16:11.380 --> 16:15.060]  into the overall design of the base image. And the base image is what should really be getting
[16:15.060 --> 16:21.660]  deployed with all those tools on them out to your infrastructure. All right. So
[16:22.540 --> 16:29.680]  one area that also you need to think about is from a network perspective, you want to use
[16:30.320 --> 16:38.100]  tools like Terraform. Terraform kind of sits out on your perimeter of your account. And it is a way
[16:38.100 --> 16:44.580]  for you to automate making changes to, let's say, for instance, security groups. Like in this
[16:44.580 --> 16:51.760]  example, you know, if I'm using F-Response, I don't want to have to bother the operations teams
[16:51.760 --> 16:59.460]  or the DevOps teams. And, you know, if we want to go and use F-Response to examine a particular node,
[17:00.600 --> 17:06.120]  we don't really want to be bothering them on a one-off basis to get access to
[17:06.120 --> 17:12.220]  these hosts. You want to be using solutions like Terraform where the perimeter already gets set up
[17:12.220 --> 17:17.060]  with continuous availability back to your forensic acquisition infrastructure, right? So whatever
[17:17.060 --> 17:21.140]  tools you're using and whatever ports they need to connect to, you want to, you know, make sure
[17:21.140 --> 17:27.420]  that that's a very narrow subset of rules. But yet the rules need to be in place
[17:27.420 --> 17:32.320]  in order for your forensics teams or threat hunting teams to have continuous availability to
[17:32.320 --> 17:38.680]  what's being deployed out in the cloud. All right. So I want to go into a use case here
[17:39.440 --> 17:44.580]  and walk you through how, you know, two things I'll talk about here is software development
[17:44.580 --> 17:49.740]  lifecycle and how that might work and the experience we had with this particular use
[17:49.740 --> 17:54.540]  case and also on how this actually made a big difference between what I was talking about
[17:54.540 --> 17:59.600]  earlier about the shared responsibility model. Like everything below the hypervisor is, you know,
[17:59.600 --> 18:04.160]  something that the cloud provider is responsible for and everything above it you're responsible.
[18:04.160 --> 18:08.460]  But sometimes the lines are a little bit blurry and I'll show you here. And I think there was a
[18:08.460 --> 18:16.440]  big success here as a result of this and forensic community is going to be quite fortunate to be
[18:16.440 --> 18:21.300]  able to have the solution in place as we encounter some issues here. So the architecture team,
[18:21.300 --> 18:27.580]  you know, what do they want to do? They wanted to drive adoption of a new EC2 instance, right?
[18:27.580 --> 18:34.000]  They wanted to move from Xen architecture to a Nitro architecture at the hypervisor level.
[18:34.000 --> 18:43.400]  They wanted to change the operating system from CentOS to Amazon Linux 2. And they wanted to,
[18:43.960 --> 18:49.820]  you know, they really didn't have a choice. Nitro, you know, defaults to XFS file system.
[18:49.820 --> 18:56.700]  We used to be using ext4 before, but now we're going to be using XFS. And the extended file
[18:56.700 --> 19:02.300]  system, they didn't want to build any partitions into that. It was easier for them to operate
[19:02.300 --> 19:09.060]  expanding the partition sizes on demand. So this is basically, you know, this was their requirements.
[19:09.480 --> 19:14.040]  Let me just go over some of this technology so it makes sense to all of you. So, you know,
[19:14.040 --> 19:21.300]  AWS's hypervisor technology, originally they had come out with this technology called Xen.
[19:21.380 --> 19:26.400]  It really helped manage the hypervisor infrastructure from a software point of view.
[19:26.900 --> 19:33.120]  And Nitro kind of changed that a little bit. They put a lot of what was being managed at the
[19:33.120 --> 19:41.020]  software level down into a lot of hardware components. And I found this visualization
[19:41.840 --> 19:45.620]  from Brendan Gregg. It's an interesting article to read about it if you want to
[19:45.620 --> 19:50.880]  deep dive more into the different types that they have and how they work.
[19:51.640 --> 19:58.180]  So what is Nitro in a nutshell? It's basically what they've done is they've developed a
[19:58.180 --> 20:03.300]  controller security chip. And they really, you know, if you watch this YouTube video,
[20:03.300 --> 20:07.720]  it really goes into depth about the security benefits of Nitro. But basically what it comes
[20:07.720 --> 20:15.800]  down to is that they are creating this security layer between the hypervisor and the hardware.
[20:15.980 --> 20:21.360]  And everything that needs to communicate between each other is basically taking place in this
[20:21.360 --> 20:25.480]  security chip. And for various reasons that they'll explain if you watch this video, why
[20:26.080 --> 20:31.240]  this is really important to them from a security point of view, it's relevant to them, right?
[20:31.240 --> 20:39.580]  Not really exposing security to you as a client. It's not a security feature for you as a client,
[20:39.580 --> 20:44.280]  but just, you know, you'll know that if you use Nitro, it's more secure than Xen, right?
[20:45.180 --> 20:54.700]  The other thing here is what is AL2? AL2 is Amazon Linux 2. We've been using CentOS,
[20:54.700 --> 20:58.240]  but we were going to adopt Amazon Linux 2. There's a couple of reasons for that. There's
[20:58.240 --> 21:03.760]  benefits for that. It's really tuned at the kernel level for their infrastructure.
[21:04.620 --> 21:12.260]  They're going to support, you know, these core packages on a continuous basis. And they also
[21:12.880 --> 21:19.720]  say that it's going to be a lot less expensive than running other types of instances. So,
[21:19.720 --> 21:25.220]  you know, obviously it's attractive to the business to have a lower cost solution in place.
[21:25.940 --> 21:30.940]  File system types, you know, ext4 has been around for a while.
[21:31.680 --> 21:37.600]  It's, you know, until recently, I think the Linux community has decided that even at the CentOS
[21:40.260 --> 21:48.060]  level, they're going to move to an XFS-based file system. Nitro exclusively uses XFS.
[21:49.060 --> 21:54.400]  XFS is a file system that was invented by Silicon Graphics years ago,
[21:54.400 --> 22:00.840]  many years ago, and it's really good at handling large files. So, you know,
[22:01.340 --> 22:09.480]  Amazon's decided that they want to move in this direction. I posted here some blogs from Hal
[22:09.480 --> 22:15.820]  Pomeranz. He's a Linux forensics guru, really doing a lot of good research on this front.
[22:16.520 --> 22:19.960]  If you want to learn more about some of the challenges here, and he talks about it in this
[22:19.960 --> 22:24.520]  blog about, you know, well, you know, for forensics, you're going to have to be faced
[22:24.520 --> 22:31.160]  with dealing with XFS on a more common basis, and he goes into a lot of the details about,
[22:31.840 --> 22:40.640]  you know, how XFS works and why it's being adopted into Linux systems. Really highly
[22:40.640 --> 22:47.640]  recommend reading that. So what are the needs of the forensics team, right? The architecture team
[22:48.240 --> 22:53.460]  is thinking, hey, I want to build this new EC2 instance and start using some
[22:54.000 --> 22:57.420]  different technologies that the cloud provider was using. So you want to introduce your
[22:57.420 --> 23:01.740]  requirements, right? So our requirements are, you know, we want to be able to do memory
[23:01.740 --> 23:07.900]  acquisitions on the host. We want to be able to perform quick triages of the instances, like,
[23:07.900 --> 23:14.840]  you know, on demand, be able to execute network connections that might be occurring on the box,
[23:14.840 --> 23:19.480]  and we also want to do host acquisitions of the assets in the cloud. So I'm going to talk a
[23:19.480 --> 23:25.840]  little bit about the first and the third. The second one, due to time and so forth, I don't,
[23:25.840 --> 23:30.920]  I'm not going to cover here, but maybe I'll do that at some other talk in the future.
[23:30.920 --> 23:34.800]  So memory, from a memory perspective, there's really, on the Linux side, there's really
[23:34.800 --> 23:46.260]  two really good tools, LiME and AVML. You know, we went with AVML. It seemed to work better for us
[23:46.260 --> 23:52.040]  on the AL2 side of things. We had been using LiME for the CentOS infrastructure.
[23:52.460 --> 23:57.620]  Again, here's an example of where, you know, the architecture team's changing something,
[23:57.620 --> 24:03.300]  but I got to go and adapt my security tools, right? So we found that AVML was a better fit
[24:03.300 --> 24:11.840]  for AL2. One of the challenges with this is that as you go and patch your AL2 instances, you're going
[24:11.840 --> 24:19.320]  to have to rebuild the AVML in real time in order to keep that profile up to date. It won't work if
[24:19.320 --> 24:27.120]  the patched kernel version is different from the one your previous profile was built for. So that's
[24:27.120 --> 24:32.720]  something to keep in mind. You want to build that into your automation process as you build, as you
[24:32.720 --> 24:36.960]  go ahead and do the patching of these hosts. And also, you know, you don't want to be dumping
[24:37.540 --> 24:41.940]  memory on the host that you're collecting the memory from. So you want to find a way to do
[24:41.940 --> 24:50.200]  that as a remote copy, right? This is, you know, basically it's pretty simple to run this command.
[24:50.300 --> 24:54.660]  One of the things we found is we decided not to use the compression option. For some reason it
[24:54.660 --> 25:04.700]  had problems running with Volatility. And we used Packer to automate the builds of our hosts.
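
A minimal sketch of the two AVML points above, under stated assumptions. The function names and the example path are hypothetical, and the sketch assumes an `avml` binary on the host that takes an output filename plus an optional `--compress` flag, as AVML's README describes. The output path is assumed to sit on remotely mounted storage so the dump does not land on the host being examined; the talk does not say how the remote copy was actually done.

```python
import platform
from typing import List, Optional


def build_avml_command(output_path: str, compress: bool = False) -> List[str]:
    """Build the argv for an AVML capture.

    Compression is off by default, mirroring the speaker's choice after
    seeing problems when analysing compressed captures.
    """
    cmd = ["avml"]
    if compress:
        cmd.append("--compress")
    cmd.append(output_path)  # assumed to point at remote storage, not local disk
    return cmd


def profile_needs_rebuild(profile_kernel: str,
                          running_kernel: Optional[str] = None) -> bool:
    """Return True when the host's kernel no longer matches the kernel
    release that the existing build or profile was made for."""
    if running_kernel is None:
        running_kernel = platform.release()
    return profile_kernel != running_kernel
```

A patching job could call `profile_needs_rebuild` after each kernel update and trigger the image rebuild only when it returns `True`.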
[25:04.940 --> 25:08.840]  I'll show you an example here of a Packer script that we're running
[25:09.980 --> 25:24.230]  in order to package and build the AVML into our base image. And again, I'll post these slides up
[25:24.230 --> 25:32.550]  online later today. So host-based forensics, what we're thinking about here is, you know,
[25:32.550 --> 25:40.010]  there's various different tools for doing acquisition forensics. F-Response and X-Ways
[25:40.010 --> 25:47.430]  Imager. And these, you know, four of these are commercial tools. DD is obviously free. It's
[25:47.430 --> 25:54.210]  native to Linux. We started looking at F-Response and DD, and I'll show you in this use case where
[25:54.210 --> 26:01.110]  we encountered some problems when we were doing host-based forensics on
[26:02.570 --> 26:08.230]  the Nitro infrastructure. We wouldn't have expected to run into a problem, but we did. So
[26:09.870 --> 26:15.810]  again, here is me, you know, running the DD command and how you might be able to,
[26:15.810 --> 26:19.270]  here's another pattern, right? Like, you know, think about these patterns and
[26:19.710 --> 26:25.230]  how can you run the DD command so that it doesn't drop the image on the host that you're trying to
[26:25.230 --> 26:31.790]  collect a byte-by-byte image of. So this is one way to do it. You
[26:31.790 --> 26:38.070]  can pipe it through SSH out to another host and drop the image there and then maybe get it out
[26:38.070 --> 26:47.270]  to an S3 bucket and out to your examiner infrastructure. So what is F-Response?
[26:47.950 --> 26:52.650]  The particular product we're using is called Universal. It supports Windows, Linux, and Mac
[26:53.390 --> 26:58.610]  forensics acquisitions. It also has this other feature set here, it looks like,
[26:58.610 --> 27:03.590]  for some other cloud-based services that you can do forensics with. I haven't even tried,
[27:03.590 --> 27:07.730]  I haven't even started using that yet, but just wanted to show it here. It's a lot more than just
[27:08.370 --> 27:13.990]  for host-based forensics. They're moving into doing cloud-based forensics as well.
[27:16.570 --> 27:20.530]  So we went ahead and we started to do our testing, right? So we're going to test Nitro
[27:20.530 --> 27:28.970]  with F-Response, right? So here's a screenshot of what it looks like in F-Response. We see the
[27:28.970 --> 27:38.410]  different partitions, and we know that the root file system is on nvme0n1p1, right? So we go
[27:38.410 --> 27:44.370]  ahead and take an acquisition of that. Everything seems fine. Everything seems to be working well.
[27:44.370 --> 27:49.950]  We get a successful image, no errors. But when I bring it up in X-Ways, which is the
[27:49.950 --> 27:56.070]  forensics tool that I use to examine the image that was created, the E01, we're getting these
[27:56.070 --> 27:59.210]  errors, right? So we start getting these errors. We're like, this is weird. I mean, I don't have
[27:59.210 --> 28:07.590]  this problem with Xen, right? I take an image of the device in Xen architecture, and X-Ways looks
[28:07.590 --> 28:11.190]  at it fine, and we're not really sure what the problem is here, right? This is in the early
[28:11.190 --> 28:17.270]  stages of our, you know, beta prototype testing, right? And, you know, what we're getting here are
[28:17.270 --> 28:23.190]  some size mismatch errors, right? And it complains about, you know, the sector size being
[28:23.190 --> 28:28.230]  less than would be expected at the boot sector. So, you know, that was really odd. Went back and
[28:28.230 --> 28:33.850]  forth with X-Ways and F-Response to try and understand what was going on here. You know,
[28:33.850 --> 28:39.950]  X-Ways is one of the few products that supports the XFS file system from an examiner point
[28:39.950 --> 28:46.310]  of view. And what happens here is it actually brings up part of the file system, but as you
[28:46.310 --> 28:51.690]  can see there, the magic number identifies it as XFS, but it didn't bring up the whole root file
[28:51.690 --> 28:57.430]  system, right? So parts of it are missing. So it was able to recover part of it. Still not ideal,
[28:57.430 --> 29:03.570]  right? I want to get the whole thing. And so, you know, we're working here with F-Response
[29:04.690 --> 29:10.290]  troubleshooting, and more things are popping up here that look kind of strange to them and to us. And
[29:10.290 --> 29:15.930]  you know, we're running this command-line tool here that they have that, I guess,
[29:15.930 --> 29:22.690]  reads the block counts for their tool. And when we looked at the block counts in the /proc/partitions
[29:22.690 --> 29:27.670]  file, things didn't seem to match up here. So we were thinking, and, you know, if you
[29:27.670 --> 29:32.590]  think about it, that X-Ways error is talking about size mismatch. So we're thinking, you know,
[29:32.590 --> 29:37.830]  something's going on here and we're not able to get the proper block counts to make a good image,
[29:37.830 --> 29:43.830]  right? So maybe that's the problem. So one of the things that we did is I said, you know what,
[29:43.830 --> 29:53.950]  let's go ahead and do, let's take a DD image of this, of these devices. And there's another tool,
[29:53.950 --> 29:58.790]  it's an open source tool, it's called Sleuth Kit. It kind of does the same thing X-Ways does,
[29:58.790 --> 30:05.250]  but this one's open source and it's free and it lets you examine the file system. But as you can
[30:05.250 --> 30:10.310]  see here, you know, at first we were like, okay, let's use Sleuth Kit, but Sleuth Kit doesn't,
[30:10.310 --> 30:16.450]  out of the box, the main branch of it right now doesn't support XFS. So, you know, we can take
[30:16.450 --> 30:22.110]  an image and it's not getting us too far in terms of being able to examine it. But, you know, after
[30:22.110 --> 30:29.130]  some conversations with Hal, he informed us that there is actually a fork that supports XFS.
[30:29.170 --> 30:34.290]  At least when we last checked a few weeks ago, it wasn't built into the main
[30:35.050 --> 30:39.650]  distribution of Sleuth Kit, but at some point it will be. But here's a link for you to go out
[30:39.650 --> 30:46.910]  to GitHub and get that if you want to use Sleuth Kit in order to do the forensics.
[30:46.910 --> 30:53.930]  But, you know, as you can see here, we created a DD image and we had no problem using the Sleuth
[30:53.930 --> 31:00.530]  Kit to look at the device, right, in the file system. So, that kind of led me to think, you
[31:00.530 --> 31:07.210]  know, this is not a problem with XFS. We also went ahead and took the DD images and loaded
[31:07.210 --> 31:16.650]  them in X-Ways and they worked fine. We were able to view the image here. So, what's going
[31:16.650 --> 31:24.890]  on here, right? So, we noticed that at least with the version that we originally had with
[31:24.890 --> 31:31.770]  F-Response, that we were only seeing the partitions and not the disks. So, as you can see here,
[31:31.770 --> 31:38.250]  we're running the lsblk command. It's on Linux and it lets you see the disks and partitions
[31:38.250 --> 31:46.050]  and the sizes. And we're not seeing nvme0n1 in the F-Response console. And that was kind of
[31:46.050 --> 31:51.810]  odd. We're thinking, well, what's going on here? So, we went ahead and communicated that back to
[31:52.990 --> 32:00.570]  the vendor. And they came back with a beta version, a patch version. And they started to identify,
[32:00.570 --> 32:03.430]  you know, understand maybe what is the problem here. Because we've been working with them now
[32:03.430 --> 32:08.670]  for quite a while on this issue. And with their patch version, I'm able to bring up
[32:08.670 --> 32:14.810]  the disks, right? Not just see the partitions. And we took an image of the disks and
[32:15.350 --> 32:21.130]  brought them up in X-Ways and everything works fine. So, you know, I think this is a big
[32:22.250 --> 32:28.250]  win, a big success for the community, for the forensic community. We've talked to F-Response.
[32:28.250 --> 32:34.570]  They told us in their next release, they're going to provide this patch and, you know,
[32:34.570 --> 32:39.410]  they're going to be able to support Nitro. And that's part of my point here, right? So,
[32:39.410 --> 32:43.170]  you know, but before I get into that, let me just show you what the problem was, right?
[32:43.170 --> 32:48.330]  So, in the meantime, we're also talking to Amazon about this issue and have support tickets open
[32:48.330 --> 32:52.470]  with them, trying to figure out what's going on here. So, in their Nitro architecture,
[32:52.470 --> 32:57.990]  they went and they're starting to use NVMe devices, right? And they have this particular
[32:57.990 --> 33:04.270]  naming pattern that they use here, as you can see here. And this naming pattern was not being
[33:04.270 --> 33:12.430]  picked up by F-Response. They were able to figure that out. And now their tool is picking up the
[33:12.430 --> 33:18.510]  disks when you run their agent on the computer. And I provided a couple of links here if you want
[33:18.510 --> 33:22.650]  to learn a lot more about it. If you're in forensics, I would recommend you start learning
[33:22.650 --> 33:28.790]  and understanding NVMe devices. It's something relatively new, but they're becoming more common,
[33:28.790 --> 33:35.050]  even in laptops and other physical computers, not just in the cloud.
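To illustrate the naming issue, here is a small Python sketch that separates NVMe whole-disk names from partition names and flags disks that a tool's device listing is missing. It assumes the standard Linux convention of `nvme<controller>n<namespace>` for a disk and an added `p<partition>` suffix for a partition. The slide's exact pattern is not reproduced in this transcript, and the function names are hypothetical.

```python
import re

# Linux NVMe block device names: nvme<controller>n<namespace>[p<partition>]
NVME_NAME = re.compile(r"^nvme(\d+)n(\d+)(?:p(\d+))?$")


def classify(name):
    """Return 'disk', 'partition', or 'other' for a block device name."""
    match = NVME_NAME.match(name)
    if match is None:
        return "other"
    return "partition" if match.group(3) is not None else "disk"


def missing_disks(device_names):
    """Return parent disks implied by partitions but absent from the list."""
    seen = set(device_names)
    missing = set()
    for name in device_names:
        match = NVME_NAME.match(name)
        if match and match.group(3) is not None:
            parent = f"nvme{match.group(1)}n{match.group(2)}"
            if parent not in seen:
                missing.add(parent)
    return sorted(missing)
```

Feeding `missing_disks` the device names an acquisition tool reports would surface the symptom described here: partitions such as `nvme0n1p1` are listed while the parent disk `nvme0n1` is not.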
[33:36.530 --> 33:41.130]  So, I just want to mention a few things. I think, you know, I've read a lot of research about,
[33:41.490 --> 33:47.110]  you know, how can you do forensics up in the cloud? And, you know, one of the ways that,
[33:48.130 --> 33:52.890]  you know, it's been communicated a lot is, you know, you can use this concept of snapshots.
[33:53.410 --> 33:57.890]  But I think that, you know, that doesn't really scale well. I think that it's, first of all,
[33:57.890 --> 34:04.770]  if the server's large, it's going to be slow to do that. And, you know, there's a lot of manual
[34:04.770 --> 34:08.610]  steps and you have to get, you know, you're going to have to get the operations teams involved and
[34:08.610 --> 34:15.410]  help you mount those snapshots and give you an environment where they, you know, you can work
[34:15.410 --> 34:21.690]  as a forensics person with those file systems. I just feel like, you know, there's better ways
[34:21.690 --> 34:26.310]  to do that. You know, as I showed, there's one way: we have a specific tool set in our
[34:26.310 --> 34:34.030]  environment that we use. It's easy. It's quite straightforward to do that. But I'm not saying
[34:34.030 --> 34:37.870]  that, you know, snapshots aren't something you would do. Maybe there's a need out there to do
[34:37.870 --> 34:41.930]  it that way. The other thing, too, is LIME. We went through a lot of complications
[34:41.930 --> 34:48.030]  trying to get LIME to work on AL2. I don't recommend that. Just stick with AVML. That
[34:48.030 --> 34:53.390]  might work better for you. And, you know, this wouldn't be a forensic presentation if I didn't
[34:53.390 --> 34:59.250]  have a lessons learned slide. You know, a couple of things to think about here, right? Build into
[34:59.250 --> 35:05.970]  your software development life cycle forensic solutions. I think you have to do that early on.
[35:05.970 --> 35:13.290]  It's really important to coordinate and collaborate with your DevOps teams, as you saw here.
[35:13.290 --> 35:19.550]  Even before we launched, you know, hundreds of Nitro machines out there, we took the time to
[35:20.030 --> 35:25.790]  understand what impact it might have with our forensics tools and solve that problem before we
[35:25.790 --> 35:31.690]  started really, you know, shooting these out to all of our applications and replacing our Xens
[35:31.690 --> 35:35.950]  with our Nitros. You want to test early. You got to catch these issues. You're going to run
[35:35.950 --> 35:43.190]  into issues. But this is particularly interesting, right? So, you know, this particular issue,
[35:43.190 --> 35:47.810]  right? If you think about the shared responsibility model, you wouldn't think this would have an
[35:47.810 --> 35:51.630]  impact, right? And originally, you know, the architecture team is like, we're just going to
[35:51.630 --> 35:57.270]  go to Nitro. It's just another Linux environment. But the things that are happening below the
[35:57.270 --> 36:02.410]  hypervisor actually may have a significant impact on the way you run your security operations.
[36:02.410 --> 36:07.670]  And it's something that you don't want to be finding out later, when there is an incident
[36:07.670 --> 36:13.310]  and, you know, a bunch of Nitro machines have been deployed, and it might be difficult for you,
[36:13.310 --> 36:17.410]  or in this case, it wouldn't be possible to use F-Response. You might have to go the alternate
[36:17.410 --> 36:23.090]  way of doing those forensic collections using DD. But that's a lot of manual steps, and
[36:24.110 --> 36:30.970]  it's not so easy to automate using those sorts of tools. You know, automation is really
[36:30.970 --> 36:35.890]  important here as a lessons learned. You want to build in that availability so that I can,
[36:35.890 --> 36:40.850]  my team can use F-Response in order to get to these machines in a way that I don't need to
[36:40.850 --> 36:47.830]  involve, you know, the operations team. I don't need to, you know, we don't need to bother the
[36:47.830 --> 36:54.330]  business, and we can on-demand connect to any machine up in our cloud if we have to do a memory
[36:54.330 --> 36:59.890]  dump and then look at that memory dump, or if we have to do forensics on a host. Very quickly,
[36:59.890 --> 37:05.010]  I would say within 15 minutes, we can connect to a machine, create an image, get the image
[37:05.010 --> 37:09.790]  into our acquisition infrastructure, and then five minutes later, it's in our examiner
[37:09.790 --> 37:16.250]  infrastructure, and I'm using X-Ways to start looking at, you know, what might have happened
[37:16.250 --> 37:22.770]  to that machine, whether it was prompted by some alert that the SOC team got or something like
[37:22.770 --> 37:28.510]  that. So within 15 to 20 minutes, we're able to do, you know, safe threat hunting. We're not
[37:28.510 --> 37:32.810]  bothering anybody. We're not bringing down systems. We're not making copies of systems. Now,
[37:32.810 --> 37:39.630]  you know, that gives us a tremendous amount of flexibility here. Just a couple of shout-outs
[37:39.630 --> 37:47.070]  here. Cloud Village team, thank you for letting me do this presentation. Also, a bunch of
[37:47.070 --> 37:51.870]  colleagues here helped me with this and were involved in this, and F-Response as well,
[37:51.870 --> 37:58.230]  for coming out with the patch. I just want to thank you. I'm going to post a write-up about this
[37:58.230 --> 38:04.170]  on my blog here, and if you have any questions, please feel free to email me.
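One way to act on the "test early" lesson is an automated sanity check for the size mismatch described earlier in the talk. The Python sketch below compares the size of a raw (DD-style) image with the block count that `/proc/partitions` reports for the source device. It would not apply to compressed or container formats such as E01, the function names are hypothetical, and it is only an illustration, not the check the speaker's team or the vendor ran.

```python
import os

BLOCK_BYTES = 1024  # /proc/partitions reports sizes in 1 KiB blocks


def parse_proc_partitions(text):
    """Map device name -> size in bytes from /proc/partitions-style text."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields: major, minor, #blocks, name.
        if len(fields) == 4 and fields[2].isdigit():
            sizes[fields[3]] = int(fields[2]) * BLOCK_BYTES
    return sizes


def image_matches_device(image_path, device_name, proc_text):
    """True if a raw image is exactly as large as the device it was taken from."""
    expected = parse_proc_partitions(proc_text).get(device_name)
    return expected is not None and os.path.getsize(image_path) == expected
```

Running a check like this against a prototype instance before a fleet rollout could catch an acquisition that silently comes up short.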
[38:05.250 --> 38:12.370]  Hey, Michael, quick question for me. Have you seen anything similar in terms of the pipeline
[38:12.370 --> 38:19.230]  that you built, or the tool sets, or just any other, you know, researcher implementing
[38:19.230 --> 38:26.250]  something like this for any of the other big clouds? Yeah, I haven't seen anything like that,
[38:26.250 --> 38:30.130]  but then again, I mean, I have a little bit of experience in Azure,
[38:30.130 --> 38:38.250]  just because we're also using that cloud service, but not with Google Cloud. We don't have
[38:38.250 --> 38:44.150]  much experience in there. So, yeah, my answer is I'm not really sure. Got it, because I'm assuming
[38:44.150 --> 38:50.030]  that it would be a little bit different because, you know, AWS uses their own proprietary hypervisor
[38:50.030 --> 38:56.310]  and all of that. So, yeah, yeah, absolutely. So, that's, I think that's the, you know,
[38:56.310 --> 39:01.130]  the important thing here is like, in order for them to scale economically, right, they're building
[39:01.130 --> 39:05.270]  their own hardware, right? Their own hardware, you know, this is not VMware that's being deployed
[39:05.270 --> 39:11.150]  there. And as they do that, you know, there's a concept of, you know, hey, that's happening below
[39:11.150 --> 39:16.930]  the management plane, and you don't need to worry about it. But in reality, when you start, you know,
[39:16.930 --> 39:21.310]  looking at it like at a forensic level, right, when you're really examining at the device level,
[39:21.310 --> 39:25.770]  that might actually turn out to be important. And if you don't do this beta testing,
[39:25.770 --> 39:29.490]  if you don't do this prototyping, before you deploy a fleet of machines on this new hypervisor
[39:29.490 --> 39:33.350]  technology, and you have a real incident, you're going to be struggling trying to do
[39:33.790 --> 39:36.730]  forensics or threat hunting in a quick manner.
